Provenance Information in a Collaborative Knowledge Graph: An Evaluation of Wikidata External References
Authors
Abstract
Wikidata is a collaboratively edited knowledge graph; it expresses knowledge in the form of subject-property-value triples, which can be enhanced with references to add provenance information. Understanding the quality of Wikidata is key to its widespread adoption as a knowledge resource. We analyse one aspect of Wikidata quality, provenance, in terms of the relevance and authoritativeness of its external references. We follow a two-stage approach. First, we perform a crowdsourced evaluation of references. Second, we use the judgements collected in the first stage to train machine learning models to predict reference quality at scale. The features chosen for the models were related to reference editing and the semantics of the triples they referred to. Of the references evaluated, 61% were relevant and authoritative. Bad references were often links that had changed and either stopped working or pointed to other pages. The machine learning models outperformed the baseline and were able to accurately predict non-relevant and non-authoritative references. Further work should focus on implementing our approach in Wikidata to help editors find bad references.
Similar resources
Opportunities and Challenges Presented by Wikidata in the Context of Biocuration
Wikidata is a world readable and writable knowledge base maintained by the Wikimedia Foundation. It offers the opportunity to collaboratively construct a fully open access knowledge graph spanning biology, medicine, and all other domains of knowledge. To meet this potential, social and technical challenges must be overcome many of which are familiar to the biocuration community. These include c...
Wembedder: Wikidata entity embedding web service
I present a web service for querying an embedding of entities in the Wikidata knowledge graph. The embedding is trained on the Wikidata dump using Gensim’s Word2Vec implementation and a simple graph walk. A REST API is implemented. Together with the Wikidata API the web service exposes a multilingual resource for over 600’000 Wikidata items and properties.
The Call for Recall
General-purpose knowledge bases (KBs) such as YAGO, Wikidata or the Google Knowledge Graph usually contain facts that have a high precision. In contrast, little is known about the recall of such KBs, and anecdotal evidence indicates that knowledge bases have a low recall on many topics. This project aims to develop techniques to evaluate and improve the recall of knowledge bases. We aim to use ...
From Freebase to Wikidata: The Great Migration
Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia’s Wikidata and Google’s Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report ...
Classification of Knowledge Organization Systems with Wikidata
This paper presents a crowd-sourced classification of knowledge organization systems based on open knowledge base Wikidata. The focus is less on the current result in its rather preliminary form but on the environment and process of categorization in Wikidata and the extraction of KOS from the collaborative database. Benefits and disadvantages are summarized and discussed for application to kno...
Publication date: 2017